Abstract

While standard meta-analysis pools the results from randomized trials
that compare two treatments, network meta-analysis aggregates the results
of randomized trials comparing a wider variety of treatment options.
However, it is unclear whether the aggregation of effect estimates across
heterogeneous populations will be consistent for a meaningful parameter when not
all treatments are evaluated on each population. Drawing from counterfactual
theory and the causal inference framework, we define the population of
interest in a network meta-analysis and define the target parameter under a
series of nonparametric structural assumptions. This allows us to determine
the requirements for identifiability of this parameter, enabling a description
of the conditions under which network meta-analysis is appropriate and
when it might mislead decision making. We then adapt several modeling
strategies from the causal inference literature to obtain consistent estimation
of the intervention-specific mean outcome and model-independent contrasts
between treatments. Finally, we perform a reanalysis of a systematic review
to compare the efficacy of antibiotics on suspected or confirmed
methicillin-resistant Staphylococcus aureus in hospitalized patients.
1 Introduction


While individual studies are rarely used to inform scientific or medical decision
making (Slavin, 1995), multiple sources of evidence may be aggregated in order
to offer more generalizable and precise comparisons between treatments (Lumley,
2004; Salanti et al., 2008; Caldwell et al., 2005; Lu and Ades, 2004).
Meta-analysis, which is the statistical synthesis of multiple study results, is often
considered the highest form of quantitative evidence due to its ability to combine all
relevant information in the scientific literature. However, because of such issues
as effect heterogeneity across study populations and methodology that does not
necessarily account for all sources of bias, the status of meta-analysis as the “gold
standard” of medical knowledge has been questioned (Berlin and Golub, 2014).

Standard meta-analysis compares two treatments of interest (or, for instance, 
an active treatment and placebo). When many treatments for a common condition 
are tested and made available over time, the medical literature may then contain 
multiple randomized controlled trials (RCTs) with various treatment comparisons 
on potentially different populations. Without additional guidance, clinicians and 
patients are left to informally synthesize information in the available studies in 
order to determine an optimal treatment decision. A network meta-analysis
statistically aggregates the results from the relevant RCTs in order to obtain an
estimate of the contrast between each pair of treatments. In particular, this type
of analysis can produce estimates of contrasts even when no RCT directly compared
the two treatments of interest.

Each RCT in the network may be performed on populations that differ in terms
of their baseline characteristics. These population-specific variables may affect
the average response to treatment so that in order to combine inference involving
the means, it might be beneficial to control for such variables (Salanti et al.,
2009). Furthermore, it has been noted that if these characteristics differentially
affect not only the response to treatment but also the initial study design choice
of which treatments to compare, then these variables may confound the overall
effect estimate (Jansen et al., 2012; Berlin and Golub, 2014). As an example,
Jansen et al. (2012) suggest that the baseline severity of patients recruited into a
study can be related to the type of treatments investigated in the study and also
affect the average outcome at the end of the study. As we demonstrate in this paper,
such “study-level confounding” must be adjusted for in order to obtain consistent
estimation of average treatment effects.

In this paper, we consider the setting where individual patient data are not
available so that the observed data are limited to average covariate and outcome
values in addition to study-level information (which we refer to as “aggregate” or
study-level data). We begin by describing past parametric approaches to network
meta-analysis where the parameter of interest is dependent on the model
specification and where the absence of effect heterogeneity is often required a priori.
Using the counterfactual framework, we propose a novel definition of a marginal
and model-independent causal parameter of interest in network meta-analysis and
delineate the assumptions required to estimate this parameter in the presence of
measured study-level confounders. We are then able to clarify conditions under
which a network meta-analysis is appropriate and when it might mislead decision
making regardless of the estimation method used. We describe several marginal
estimation methods adapted from the single study causal inference setting, including
a doubly robust and semiparametric locally efficient Targeted Maximum
Likelihood Estimator, and then compare these methods in a simulation study. Finally,
we perform a reanalysis of the systematic review by Bally et al. (2012) to compare
the efficacy of antibiotics on suspected or confirmed methicillin-resistant
Staphylococcus aureus (MRSA) in hospitalized patients.


2 The observed data 


Each RCT is assumed to randomly sample subjects from a wider population,
called a superpopulation. Within the RCT, randomization assigns subjects to two
or more groups, each one receiving a treatment. These groups are often referred
to as treatment arms. Due to randomization and random sampling, each group
is a representative sample from the superpopulation. Therefore, each arm can be
thought of as a distinct study on the same superpopulation. The superpopulations
targeted by the RCTs may differ in terms of their characteristics due to, for
example, each trial’s physical and temporal location, the individual inclusion and
exclusion criteria, and the recruitment sample size targets. Therefore, if effect
heterogeneity exists (i.e. if the relative treatment effects at the subject level depend
on baseline covariate values), one would not expect the average relative treatment
effects to necessarily be equal across superpopulations.

More formally, the superpopulation is the conceptual group of essentially
infinite size from which the study sample is selected (Robins, 1988). A measure of
some outcome ($Y$) is taken on each subject in the RCT arm. In this article, we will
generally consider the example where the sample mean and standard deviation of
$Y$ are the summary statistics computed in each RCT.

Let $A_{ij}$ be the intervention received by subjects in arm $j$ of a particular RCT
indexed by $i$. For this arm, we observe an estimated mean outcome $Y_{ij}$ and standard
deviation $S_{ij}$. Let $O_i = \{W_i, n_i, \{N_{ij}, A_{ij}, Y_{ij}, S_{ij}\}; j = 1, \ldots, n_i\}$, where
$W_i$ is study baseline information and $n_i$ is the number of arms in the study. For the
$j$-th arm of RCT $i$, let $N_{ij}$ be the number of subjects and $N$ be the total number
of RCTs in the sample.
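
To make the structure of $O_i$ concrete, the following is a minimal sketch of how the aggregate record of one study could be held in code. The class names, field names, the `severity` covariate and all numbers are hypothetical choices made for illustration; only the correspondence to $W_i$, $n_i$, $N_{ij}$, $A_{ij}$, $Y_{ij}$ and $S_{ij}$ comes from the definitions above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Arm:
    n_subjects: int   # N_ij: number of subjects in arm j of study i
    treatment: str    # A_ij: intervention received by the arm
    mean: float       # Y_ij: sample mean outcome
    sd: float         # S_ij: sample standard deviation


@dataclass
class Study:
    covariates: Dict[str, float]   # W_i: study-level baseline information
    arms: List[Arm]

    @property
    def n_arms(self) -> int:       # n_i: number of arms in the study
        return len(self.arms)


# Hypothetical records with made-up numbers, for illustration only.
studies = [
    Study({"severity": 2.0}, [Arm(120, "a", 1.4, 0.9), Arm(118, "b", 1.1, 1.0)]),
    Study({"severity": 3.5}, [Arm(60, "a", 2.0, 1.2), Arm(62, "c", 1.7, 1.1),
                              Arm(59, "b", 1.9, 1.3)]),
]
n_studies = len(studies)           # N: total number of RCTs in the sample
```

Keeping the arms nested inside the study mirrors the fact that $W_i$ is shared by all arms of the same RCT.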

Because we are interested in summarizing effects across multiple
superpopulations, we are arguably attempting to estimate effects in a metapopulation that
contains the individual superpopulations from each study. For the purpose of this
paper, we define the metapopulation as the union of possible study
superpopulations and define our parameters of interest with respect to this metapopulation. In
particular, we assume that the individual $O_i$ vectors are independently drawn from
the metapopulation and identically distributed.


3 Past approaches to network meta-analysis 


Standard approaches in network meta-analysis where only aggregate data are
observed place a hierarchical model on either the study-specific contrasts (e.g. the
difference in means, $Y_{i1} - Y_{i2}$) or the arm-specific outcomes ($Y_{ij}$) and specify a
within-study correlation structure (Lu and Ades, 2004; Salanti et al., 2008; Dias
et al., 2013a; Zhang et al., 2014). As the absence of effect heterogeneity is often
required, a priori (Cope et al., 2014) and post-hoc (Lu and Ades, 2006)
investigation of this assumption is routinely recommended. The reader is referred to
published guidance (Dias et al., 2013b; Jansen et al., 2014) and to an example
of how heterogeneity was accounted for in an economic analysis (Welton et al.,
2015). There has been recent heated debate about the appropriateness of
arm-based estimation methods (Dias et al., 2013a; Hong et al., 2016).

The effect targeted in a hierarchical model depends on the contrast type chosen
and the parametrization of the model, and may or may not correspond to a
marginal effect as we define further on. For binary outcomes, due to the
non-collapsibility of the logistic regression model (Gail et al., 1984) in particular,
adjustment for covariates in such a model changes the true value of the “effect”
parameter being estimated. This type of modeling strategy may therefore be
biased for the estimation of a marginal effect. Even in linear models, the inclusion
of treatment interactions with covariates can also bias the value of the coefficient
of treatment relative to the marginal effect. Zhang et al. (2014) and Zhang et al.
(2015) take a missing data perspective and model the arm-specific outcomes using
a Bayesian hierarchical model to estimate marginal parameters. While neither
approach has yet been extended to incorporate covariates, the former paper
assumes that treatments are applied to studies at-random while the latter allows for
estimation in a not-at-random context by explicitly specifying the unobservable
selection mechanism.


While adjustment for covariates is rare in practice, Jansen et al. (2012)
introduced the notion of adapting Pearl’s causal directed acyclic graphs (DAGs) to
this setting (Pearl, 2009) in order to assist in covariate selection. As a general
rule, Jansen et al. (2012) advocate for the adjustment of all modifiers of the
relative treatment effects across comparisons. They also discourage adjustment for
covariates that are not effect modifiers due to the fact that they may induce bias in
the meta-analysis.


4 The counterfactual approach 

Let $Y^a$ be the potential (or counterfactual) outcome of a random subject drawn
from the metapopulation had that subject received treatment $A = a$. In an RCT,
each study arm produces an estimate of the superpopulation-specific mean of the
outcome $Y^a$ under the treatment assigned. Let the true mean of the potential
outcome under treatment $a$ for the superpopulation targeted in study $i$ be denoted
$M_i^a := E(Y^a \mid P_i)$ where $P_i$ represents the superpopulation targeted in study $i$.
Let $\Sigma_i^a := \sqrt{\mathrm{Var}(Y^a \mid P_i)}$ be the standard deviation of the potential outcomes in
$P_i$ under treatment $a$. Now suppose that each superpopulation is independently
drawn from a metapopulation, $\mathcal{P} = \bigcup_{i \in S_P} P_i$, the union of all possible study
superpopulations indexed by the set $S_P$. A marginal target parameter in a
meta-analysis is $M^a := E(Y^a) = E(M_i^a)$, which represents the mean outcome under
treatment $a$ on the metapopulation. The standard deviation of the overall
outcome distribution is

$$\Sigma^a := \sqrt{\mathrm{Var}(Y^a)} = \sqrt{E\{\mathrm{Var}(Y^a \mid P_i)\} + \mathrm{Var}\{E(Y^a \mid P_i)\}} = \sqrt{E\{(\Sigma_i^a)^2\} + \mathrm{Var}(M_i^a)},$$

representing the within and between study heterogeneity in
the outcome under treatment. Due to treatment arm randomization and random
sampling, $Y_{ij}$ is an unbiased estimate of $M_i^{A_{ij}}$, the mean potential outcome
under the observed treatment, and $S_{ij}$ is a consistent estimate of $\Sigma_i^{A_{ij}}$, the potential
standard deviation under the observed treatment.
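
The decomposition of $\Sigma^a$ into within and between study components can be checked numerically. The sketch below assumes hypothetical distributions for $M_i^a$ and $\Sigma_i^a$ (normal and uniform, chosen only for illustration) and normally distributed subject-level outcomes; it draws one subject per superpopulation and compares the empirical standard deviation with the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n_superpops = 200_000

# Hypothetical superpopulation-level parameters under a fixed treatment a.
m_i = rng.normal(loc=1.0, scale=0.5, size=n_superpops)      # M_i^a
sigma_i = rng.uniform(low=0.5, high=1.5, size=n_superpops)  # Sigma_i^a

# One random subject per superpopulation: draws of Y^a from the metapopulation.
y = rng.normal(loc=m_i, scale=sigma_i)

sd_direct = y.std()                                          # empirical Sigma^a
sd_formula = np.sqrt(np.mean(sigma_i**2) + np.var(m_i))      # sqrt(E{(Sigma_i^a)^2} + Var(M_i^a))
print(sd_direct, sd_formula)
```

The two printed values should agree up to Monte Carlo error, which is the law of total variance applied to the metapopulation.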

For two treatments, $A = a$ and $b$, with corresponding means $M^a$ and $M^b$, we
can define a causal effect as the contrast between the mean outcome when the
entire metapopulation is treated according to one treatment versus another. For
instance, for binary outcomes we may define the causal risk difference as $M^a - M^b$
and the causal risk ratio as $M^a / M^b$.

The patient sample in any given study arm may not be representative of the 
metapopulation, for which the effect of interest is defined. In addition, because 
treatment was not randomly allocated across different RCTs, the collection of 
mean outcomes observed under a given treatment a may not be representative 
of the metapopulation under treatment a. At the design stage, the decision of 
which treatments to include as arms within an RCT may be influenced by the 
characteristics of the superpopulation on which the study is taking place. For 
instance, consider the example of planning a study for a superpopulation with 
higher disease severity from Jansen et al. (2012). Studies including patients with
severe disease are more likely to include an arm with an aggressive treatment. If 
this occurs, the mean outcome under the aggressive treatment may be different 
than in a less severe superpopulation. In this situation, we would say that the 
treatment-mean outcome relationship is confounded at the study level by severity. 


4.1 A Causal Directed Acyclic Graph (DAG) for network meta-analysis


Similar to Alonso et al. (2015), we assume that heterogeneity in the different
superpopulations targeted in the individual RCTs implies that each RCT estimates
a different causal effect. Like Zhang et al. (2014), we take an “arm-based”
approach to the problem. Like Jansen et al. (2012), we draw a causal DAG in order
to conceptualize the relationship between treatment, study results, and
population-specific characteristics. We arbitrarily choose to intervene on the arm labeled $j$ in
each study. We write $N_i = \{N_{ij}; j = 1, \ldots, n_i\}$, the vector of sample sizes across
arms. We will also define $A_i = \{A_{ij}; j = 1, \ldots, n_i\}$, the vector of treatment
assignments evaluated in study $i$, and $A_{i \setminus j}$ to mean the treatment vector excluding some
arm $j$.


Many of the assumptions presented in detail in Section 4.3 are drawn explicitly
using the study-level DAG in Figure 1(a). The nodes of the DAG represent
variables measured at the level of the RCT and the arrows between them represent
the effect of the parent on the child node. For example, the absence of an arrow
from $A_{i \setminus j}$ to $Y_{ij}, S_{ij}$ represents a component of the “no interference” assumption
that the treatment in one arm will not affect the outcome in another. The arrow
from $N_{ij}$ to $Y_{ij}, S_{ij}$ is present because the sample size within a study arm will affect
the distribution of the outcome summary statistics.
Figure 1: (a) The study-level DAG reflecting the unconfoundedness and
time-ordering assumptions made in Sections 4.2 and 4.3, without assuming
independence between the sample mean and standard deviation within a study arm. (b) The
simplified DAG that arises from assuming the independence between the sample
mean and standard deviation. Here, $W_{1i}$, $W_{2i}$ and $W_{3i}$ are baseline covariates, $A_{ij}$
and $A_{i \setminus j}$ are the treatments assigned to arm $j$ and the non-$j$ arm(s), respectively,
$N_{ij}$ is the sample size of arm $j$, $Y_{ij}$ is the mean outcome and $S_{ij}$ is the estimated
standard deviation of the outcome of arm $j$.


The sample size node Nij is determined by the sample size calculation made in 
the study design phase and also by the success of recruitment. This calculation is 
inherently conditional on the superpopulation being evaluated, as superpopulation 
characteristics are taken into account when hypothesizing an effect size and stan¬ 
dard error. This calculation is also conditional on the treatments being compared. 

Causal DAGs can be used as a tool to identify which variables must be
controlled for in the meta-analysis in order to estimate the treatment-specific
metapopulation mean outcome. Depending on some underlying statistical assumptions
that we will investigate in detail in the following sections, these DAGs may
simplify to Figure 1(b). This happens because we can ignore the mediation path
through $N_{ij}$ in order to estimate the total effect of the treatment on outcome. Under
these conditions, assuming independence between the variables in $W_i$, the
analysis must adjust for all common causes of treatment selection and study outcome
distribution.

Note that the recommendations based on this DAG differ from those of Jansen
et al. (2012), who say that the analysis must adjust exclusively for effect modifiers.
The assumptions that we list in Section 4.3 are explicitly required in the steps we
take in Section 4.2 in order to obtain identifiability of the meta-analysis parameter
of interest.


4.2 The G-formula and nonparametric identifiability 


Suppose we observe the aggregate data $O_i, i = 1, \ldots, N$, independently drawn
and identically distributed. Using the nonparametric structural equation modeling
(NPSEM) of Pearl (2009), the metapopulation mean outcome, $M^a$, can be shown
to be identifiable (that is, known with infinite data) under the several conditions
outlined and discussed in Section 4.3.


4.2.1 The observed data generation 


At the study design stage for RCT $i$, the superpopulation $P_i$ is randomly drawn
from the metapopulation. The selection of $P_i$ determines the population-level
covariates $W_i$. The number of study arms $n_i$ and the treatments compared in the
study, the multivariate $A_i = \{A_{ij}; j = 1, \ldots, n_i\}$, are drawn conditional on $W_i$. The
sample size calculation is carried out based on the choice of treatment comparison
and on the sub-population characteristics (i.e. based on the expected effect and
precision in that sub-population). This calculation is approximate and the
resulting sample size also depends on the success of recruitment. Therefore, the sample
sizes for the treatment arms, $N_i$, are not deterministic, but are drawn conditional
on $A_i$, $n_i$ and $W_i$.

The second stage operates at the individual level once subjects are recruited
and randomly assigned treatment. Suppose each subject $k$ in arm $j$ of study $i$ has
continuous outcome $Y_{ijk}, k = 1, \ldots, N_{ij}$ (under treatment $A_{ij}$). Each $Y_{ijk}$ is
independently drawn from a distribution with mean $M_i^{A_{ij}}$ and standard deviation $\Sigma_i^{A_{ij}}$. The
empirical mean outcome in arm $j$ of study $i$ is therefore $Y_{ij} = \frac{1}{N_{ij}} \sum_k Y_{ijk}$. The
standard deviation $S_{ij}$ is estimated through $S_{ij}^2 = \frac{1}{N_{ij} - 1} \sum_k (Y_{ijk} - Y_{ij})^2$. In addition,
subject recruitment yields summary characteristics of the superpopulation, which
we assume to include complete information about the covariates $W_i$ that were
known at study conception and contributed to the treatment choice. We assume in
the following that we do not observe the subject-level data.

Let $\omega_{ij}$ represent the set of estimated summary statistics of the outcome
variable from study $i$ arm $j$. For instance, we might have that $\omega_{ij} = \{Y_{ij}, S_{ij}\}$.
Correspondingly, let $\omega_{ij}^a$ be the set of estimates of the counterfactual summary statistics
that would arise had arm $j$ been assigned treatment $a$.
Assuming no interference between arms and that the distribution of $\omega_{ij}$ in one
arm of a study is conditionally independent of the outcomes in the others and also
independent of the total number of arms, the NPSEM that we assume can then be
written as

$$
\begin{aligned}
W_i &= f_W(\varepsilon_W) \\
n_i &= f_n(W_i, \varepsilon_n) \\
A_i &= f_A(n_i, W_i, \varepsilon_A) \\
N_i &= f_N(A_i, n_i, W_i, \varepsilon_N) \\
\omega_{ij} &= f_\omega(N_{ij}, A_{ij}, W_i, \varepsilon_\omega), \quad \text{for } j = 1, \ldots, n_i.
\end{aligned}
$$


The probability density function $f(O_i)$ arising from the NPSEM without
intervention can be decomposed as

$$f(O_i) = Q_W(W_i)\, Q_n(n_i \mid W_i)\, g_A(A_i \mid n_i, W_i)\, Q_N(N_i \mid A_i, n_i, W_i) \prod_{j=1}^{n_i} Q_\omega(\omega_{ij} \mid N_{ij}, A_{ij}, W_i)$$

where $Q_W(W_i)$ corresponds to the density function for $W_i$, $Q_n(n_i \mid W_i)$ corresponds
to the density function for $n_i$ conditional on $W_i$, and $g_A(A_i \mid n_i, W_i)$ corresponds to
the conditional density function for $A_i$. Within each RCT, $Q_N(N_i \mid A_{ij}, A_{i \setminus j}, n_i, W_i)$
corresponds to the conditional density function for $N_i$ and $Q_\omega(\omega_{ij} \mid N_{ij}, A_{ij}, W_i)$ is
the conditional (joint) density for the measured summary statistic(s) in arm $j$.
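
The time ordering encoded by the NPSEM can be illustrated by drawing one aggregate record at a time in the same order as the factorization: $W_i$, then $n_i$, then $A_i$, then the sample sizes, then the arm-level summaries. This is a sketch only: every coefficient, the four treatment labels and the normal subject-level outcome model are hypothetical assumptions made for illustration, not values taken from this paper.

```python
import numpy as np

# Hypothetical coefficients chosen only for illustration.
TREATMENTS = ["a", "b", "c", "d"]
SELECTION_SLOPE = np.array([0.4, -0.4, 0.2, -0.2])   # how W_i shifts treatment choice
MEAN_UNDER = {"a": 0.0, "b": 0.5, "c": 1.0, "d": 1.5}
RECRUITMENT = {"a": 1.0, "b": 0.8, "c": 1.2, "d": 1.0}


def draw_study(rng):
    """Draw one aggregate record O_i following the ordering of the NPSEM."""
    w = int(rng.poisson(2))                                      # W_i
    n_arms = 3 if rng.random() < 0.1 + 0.05 * min(w, 4) else 2   # n_i given W_i
    weights = np.exp(SELECTION_SLOPE * w)
    chosen = rng.choice(TREATMENTS, size=n_arms, replace=False,
                        p=weights / weights.sum())               # A_i given n_i, W_i
    arms = []
    for trt in map(str, chosen):
        lam = 300.0 / n_arms * np.exp(-0.1 * w) * RECRUITMENT[trt]
        n_ij = 2 + int(rng.poisson(lam))                         # N_ij given A_i, n_i, W_i
        y_ijk = rng.normal(MEAN_UNDER[trt] + 0.5 * w, 1.0, size=n_ij)
        # omega_ij = {Y_ij, S_ij}: only the summaries are kept, not y_ijk.
        arms.append({"A": trt, "N": n_ij,
                     "Y": float(y_ijk.mean()), "S": float(y_ijk.std(ddof=1))})
    return {"W": w, "n": n_arms, "arms": arms}


rng = np.random.default_rng(2024)
sample = [draw_study(rng) for _ in range(5)]
```

Because treatment choice and the mean outcome both depend on `w`, records generated this way exhibit the study-level confounding discussed above.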


4.2.2 The counterfactual distribution 


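Define an intervention as the assignment of treatment strategy $a$ to an arbitrary
arm in each study. In other words, for all $i$ we set $A_{ij} = a$ for a single arbitrary
arm $j$. The remaining non-$j$ arms receive potential treatments $A^a_{i \setminus j}$. The joint
density for the counterfactual data $O_i^a = (W_i, n_i, A^a_{i \setminus j}, \{\omega^a_{ij^*}, N^a_{ij^*}; j^* = 1, \ldots, n_i\})$ can
be obtained through the G-formula (Robins, 1986). This joint density function can
be written as

$$
\begin{aligned}
f(O_i^a) = {} & Q_W(W_i)\, Q_n(n_i \mid W_i)\, g_{A \setminus j}(A^a_{i \setminus j} \mid n_i, W_i)\, Q_N(N^a_{ij} \mid A^a_{i \setminus j}, n_i, W_i)\, Q_\omega(\omega^a_{ij} \mid N^a_{ij}, W_i) \\
& \times \prod_{j^* \neq j} Q_\omega(\omega^a_{ij^*} \mid N^a_{ij^*}, A^a_{ij^*}, W_i)\, Q_N(N^a_{ij^*} \mid A^a_{i \setminus j}, n_i, W_i)
\end{aligned}
$$

where $g_{A \setminus j}(A^a_{i \setminus j} \mid n_i, W_i)$ is defined as the conditional (joint) density of the
treatments assigned to non-$j$ arms.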
4.2.3 Identifiability for conditionally independent Y and S


Suppose we have that $\omega_{ij} = \{Y_{ij}, S_{ij}\}$, meaning that each study reported the
sample means and sample standard deviations of a continuous outcome. We then make
the key structural assumption that $Y^a_{ij} \perp\!\!\!\perp S^a_{ij} \mid N^a_{ij}, W_i$ where $N^a_{ij}$ is the counterfactual
sample size in study $i$ arm $j$. Let $Y^a_{ijk}$ represent an individual recruited into study
$i$ arm $j$ in the counterfactual scenario. The independence assumption arises
naturally from the distributional assumption that $Y^a_{ijk} \sim N(M_i^a, (\Sigma_i^a)^2)$ because $Y^a_{ij}$ and
$S^a_{ij}$ are the sample mean and standard deviation in superpopulation $P_i$ when $a$ is
the treatment assigned. Asymptotically, we have that $Y^a_{ij}$ and $S^a_{ij}$ are independent
normal variables when the subject-level outcomes are drawn from a distribution
with zero skew, such that $E\{(Y^a_{ijk} - M_i^a)^3\} = 0$ (Ferguson, 1996, p. 46). We show in
Appendix A.1 that under this assumption $f(O_i^a)$ can be decomposed in such a way
that the mean outcome, $E(Y^a_{ij}) = M^a$, can be written independently of the non-$j$
arms, resulting in the simple equality $M^a = \int_{W_i} E(Y^a_{ij} \mid N^a_{ij}, W_i)\, Q_W(W_i)\, dW_i$. Under
the unconfoundedness assumption $Y^a_{ij} \perp\!\!\!\perp A_{ij} = a \mid N^a_{ij}, W_i$, and under the
consistency assumption (see next section) we may write the G-formula (Robins, 1986)
$M^a = \int_{W_i} E(Y_{ij} \mid W_i, A_{ij} = a)\, Q_W(W_i)\, dW_i$. Therefore, this quantity is identifiable
from the data.
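
The chain of equalities behind this identification result can be laid out step by step. The display below is a sketch of the argument under the stated assumptions; it additionally assumes that consistency extends to the sample size (so that $N^a_{ij} = N_{ij}$ when $A_{ij} = a$) and uses the fact that the conditional mean of a sample mean does not depend on the sample size.

```latex
\begin{align*}
M^a &= E(Y^a_{ij}) \\
    &= \int_{W_i} E(Y^a_{ij} \mid N^a_{ij}, W_i)\, Q_W(W_i)\, dW_i
       && \text{(decomposition of } f(O_i^a)\text{, Appendix A.1)} \\
    &= \int_{W_i} E(Y^a_{ij} \mid N^a_{ij}, W_i, A_{ij} = a)\, Q_W(W_i)\, dW_i
       && \text{(unconfoundedness)} \\
    &= \int_{W_i} E(Y_{ij} \mid N_{ij}, W_i, A_{ij} = a)\, Q_W(W_i)\, dW_i
       && \text{(consistency)} \\
    &= \int_{W_i} E(Y_{ij} \mid W_i, A_{ij} = a)\, Q_W(W_i)\, dW_i
       && \text{(mean of } Y_{ij} \text{ does not depend on } N_{ij}\text{)}
\end{align*}
```

Every term in the last line is a functional of the observed data distribution, which is what identifiability requires.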

Identifiability without assuming this structural independence is possible, and
we describe the additional causal assumptions required for this setting in
Appendix A.2.


4.2.4 Identifiability for binary outcomes 

If the original study outcomes are binary (such that $Y_{ijk} \in \{0, 1\}$), the study means
$Y_{ij}$ are the proportions of subjects with the indicated outcome. Therefore, $N^a_{ij} Y^a_{ij}$
has a binomial distribution with true probability of outcome $M_i^a = E(Y^a \mid P_i)$.
Then, $\Sigma_i^a = \sqrt{\mathrm{Var}(Y^a \mid P_i)} = \sqrt{M_i^a (1 - M_i^a)}$. Similarly, the study arm estimate
of the standard deviation is $S_{ij} = \sqrt{Y_{ij}(1 - Y_{ij})}$. In this case, the likelihood will
not include a component for $S_{ij}$ so no independence assumption is necessary. The
resulting G-formula is still $M^a = \int_{W_i} E(Y_{ij} \mid N_{ij}, W_i, A_{ij} = a)\, Q_W(W_i)\, dW_i$ and will
rely on the same unconfoundedness assumption that $Y^a_{ij} \perp\!\!\!\perp A_{ij} = a \mid N_{ij}, W_i$.
4.3 Assumptions


For convenience, here we list the assumptions needed for the identification of
$M^a$, corresponding with the NPSEM in Section 4.2.1 and the DAGs in Figure 1.
We also comment on the meaning and plausibility of these assumptions in the
hypothetical situation where each individual RCT has full compliance. Under full 
compliance, each RCT arm produces a consistent estimate of the mean outcome 
in the superpopulation under full adherence to the assigned treatment. 

No interference. The use of the above counterfactual notation presupposes
that the treatment assigned to one study does not affect the counterfactual
outcome of another study (Rubin, 1980). A secondary level of interference within an
individual study involves the treatment in one study arm affecting the outcomes
in another study arm. This means that the estimates $Y^a_{ij}$ and $S^a_{ij}$ do not depend on
the treatment received by another arm of the same RCT. The assumption of no
interference will generally not hold for certain studies of infectious disease. For
example, an effective vaccine in one arm may impact the outcome of an
unvaccinated subject in the control arm, because the unvaccinated subject will be less
likely to be exposed to the disease through herd immunity.

Unconfoundedness. (Weak) unconfoundedness (Imbens, 2000) is required for
the identification of $M^a$. In this context, unconfoundedness is the assumption that
the counterfactual sample means under a treatment $a$ are independent of the true
treatment received conditional on measured covariates. Specifically, this means
that $Y^a_{ij} \perp\!\!\!\perp A_{ij} = a \mid N^a_{ij}, W_i$. In the example DAG of Figure 1(a) this corresponds
to measuring all the components of node $W_{2i}$. The validity of this assumption is
entirely dependent on the subject-matter, how RCTs in the field are designed, and
on the information reported in the RCTs.

Consistency. The consistency assumption in this context states that the
counterfactual mean of a study arm under a given treatment is the same as the observed
result. With notation, this is equivalent to stating that $Y^a_{ij} = Y_{ij}$ when $A_{ij} = a$.
Having different definitions of treatment across studies may violate this
assumption if all are categorized under the same treatment type and this variation has
an impact on the outcome (Cole and Frangakis, 2009). For example, there may
be different drug dosages and lengths of follow-up across studies. Disregarding
these differences will violate consistency if the various treatment-types have
differential effects on the patient outcomes. With some additional unconfoundedness
requirements, one might surmount this obstacle using the approach described in
VanderWeele and Hernan (2013). (Note that this definition of consistency
corresponds with the causal assumption and is distinct from the network meta-analysis
meaning of the term in e.g. Lu and Ades, 2006.)

Positivity. Finally, we need to evaluate both theoretical and practical
positivity. Theoretical positivity is the assumption that, conditional only on variables
required for unconfoundedness, all studies had a positive probability of being
assigned each treatment under investigation. Practical positivity is the condition that
for every level of the characteristics $W_i$, there is an estimated positive probability
of receiving treatment.

It is important to note that treatment comparisons are based on the same
metapopulation and that the target parameter $M^a = E(M_i^a)$ relies on the definition of this
metapopulation. If positivity does not hold on some subpopulations it would be necessary
to exclude all studies (and corresponding superpopulations) that contain such
subpopulations.

It is furthermore important to note that the positivity assumption is not the
same as requiring that all studies could have realistically been assigned each
treatment. In particular, certain treatments may not have been available when some
older trials were carried out. If year of study is not required to unconfound the
analysis, then the unconditional probability may still be non-zero.
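
Practical positivity can be screened empirically when $W_i$ takes a small number of levels. The sketch below assumes a single discrete study-level covariate and uses the raw proportion of studies that include treatment $a$ at each level as the estimated probability; the function name `empirical_positivity` and the toy data are hypothetical.

```python
from collections import defaultdict


def empirical_positivity(studies, treatment):
    """Share of studies at each covariate level that include `treatment` in some arm."""
    counts = defaultdict(lambda: [0, 0])   # level -> [studies with treatment, all studies]
    for w, treatments in studies:
        counts[w][1] += 1
        if treatment in treatments:
            counts[w][0] += 1
    return {w: with_trt / total for w, (with_trt, total) in sorted(counts.items())}


# Hypothetical data: (covariate level W_i, set of treatments A_i evaluated in study i).
studies = [(0, {"a", "b"}), (0, {"b", "c"}), (1, {"a", "c"}),
           (2, {"b", "c"}), (2, {"b", "d"})]
shares = empirical_positivity(studies, "a")
violations = [w for w, share in shares.items() if share == 0.0]
print(shares)       # {0: 0.5, 1: 1.0, 2: 0.0}
print(violations)   # [2]
```

In this toy example no study at level 2 evaluated treatment `a`, so studies at that level would be candidates for exclusion under the reasoning above.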


5 Estimation of the treatment-specific metapopulation mean outcome


5.1 G-Computation 


G-Computation procedures based on the G-formula in Section 4.2 can be used to
estimate the target parameter. Here we define a simple procedure resulting from
the data requirement that the sample mean and standard deviation are independent
within a study arm. This procedure allows for simple frequentist estimation of the
mean effect of treatment.

This procedure requires estimates for the conditional expectation $E(Y_{ij} \mid W_i, N_{ij}, A_{ij} = a)$
for a given value of treatment. First we must note that while the conditional
mean of $Y_{ij}$ is independent of $N_{ij}$, its distribution is not. In particular, we have that

$$\mathrm{Var}(Y_{ij} \mid W_i, N_{ij}, A_{ij}) = \frac{1}{N_{ij}} \mathrm{Var}(Y_{ijk} \mid W_i, N_{ij}, A_{ij}) = \frac{(\Sigma_i^{A_{ij}})^2}{N_{ij}}.$$

Because $S_{ij}^2$ is a consistent estimate of the superpopulation-level variance under
treatment $A_{ij}$, we are able to estimate this variance.
A model for the regression on $Y_{ij}$ may be fit by pooling over all arms regardless
of treatment assignment. In order to obtain the Best Linear Unbiased Estimator,
we can weight by $N_{ij}/S_{ij}^2$. Using this model fit, we predict $\hat{Y}_i^a = \hat{E}(Y_{ij} \mid W_i, A_{ij} = a)$,
i.e. the predicted mean under treatment $a$ for each study. The G-Computation
estimate is then $\hat{M}^a_{Gcomp} = \frac{1}{N} \sum_{i=1}^{N} \hat{Y}_i^a$.

The standard error for the G-Computation estimate is usually computed through
nonparametric bootstrap methods (Snowden et al., 2011). Bootstrap resampling
must be done by resampling studies, rather than arms, similar to what is done in a
study with clustering (Efron and Tibshirani, 1994).
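
The G-Computation steps and the study-level bootstrap can be sketched as follows. This is an illustration under simplifying assumptions rather than the analysis code used for this paper: the outcome model is a hypothetical main-effects linear model in one covariate, treatments are coded as integers `0, ..., n_trt - 1`, and all function names and the simulated data are made up for the example.

```python
import numpy as np


def design(w, trt, n_trt):
    """Intercept, treatment indicators (level 0 as reference) and W."""
    dummies = (trt[:, None] == np.arange(1, n_trt)[None, :]).astype(float)
    return np.column_stack([np.ones(len(w)), dummies, w])


def gcomp(study, w, trt, y, s, n, a, n_trt):
    """G-Computation estimate of the metapopulation mean under treatment `a`."""
    sw = np.sqrt(n / s**2)                       # weights N_ij / S_ij^2
    x = design(w, trt, n_trt)
    beta, *_ = np.linalg.lstsq(x * sw[:, None], y * sw, rcond=None)
    _, first = np.unique(study, return_index=True)
    w_study = w[first]                           # one W_i per study
    x_a = design(w_study, np.full(len(w_study), a), n_trt)
    return float(np.mean(x_a @ beta))            # average prediction over studies


def bootstrap_se(study, w, trt, y, s, n, a, n_trt, n_boot=200, seed=0):
    """Bootstrap standard error, resampling whole studies rather than arms."""
    rng = np.random.default_rng(seed)
    ids = np.unique(study)
    estimates = []
    for _ in range(n_boot):
        sampled = rng.choice(ids, size=len(ids), replace=True)
        rows = np.concatenate([np.flatnonzero(study == i) for i in sampled])
        # Relabel so that a study drawn twice counts as two studies.
        new_id = np.repeat(np.arange(len(sampled)),
                           [np.sum(study == i) for i in sampled])
        estimates.append(gcomp(new_id, w[rows], trt[rows], y[rows],
                               s[rows], n[rows], a, n_trt))
    return float(np.std(estimates, ddof=1))


# Hypothetical two-arm studies with three treatments, for illustration only.
rng = np.random.default_rng(1)
n_studies, n_trt = 30, 3
study = np.repeat(np.arange(n_studies), 2)
w = np.repeat(rng.poisson(2, n_studies).astype(float), 2)
trt = np.concatenate([rng.choice(n_trt, 2, replace=False) for _ in range(n_studies)])
n = rng.integers(50, 200, size=2 * n_studies).astype(float)
s = np.ones(2 * n_studies)
effect = np.array([0.0, 0.5, 1.0])
y = 1.0 + effect[trt] + 0.3 * w + rng.normal(0.0, s / np.sqrt(n))

estimate = gcomp(study, w, trt, y, s, n, a=1, n_trt=n_trt)
se = bootstrap_se(study, w, trt, y, s, n, a=1, n_trt=n_trt)
truth = 1.0 + effect[1] + 0.3 * w[::2].mean()
```

If a bootstrap resample happens to contain no arm with a given treatment, the least squares fit for that resample is not identified for that treatment; a fuller implementation would need to handle this case explicitly.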


5.2 Inverse probability of treatment weighting 

Likelihood methods, such as G-Computation, require correct parametric
specification of the outcome model, which may be difficult to achieve. An alternative
approach is to utilize propensity score methods, which require the estimation of
a model for the treatment received by the arm. For a given treatment type $a$, let
$g_a(W_i)$ be an estimate of the probability $P(a \in A_i \mid W_i)$, called the generalized
propensity score (Imbens, 2000).

Despite the small sample size in standard network meta-analysis, one might
attempt inverse probability of treatment weighting (IPTW) for the estimation of
the marginal parameter. Let $Y_i^a$ represent the observed outcome of the arm of
study $i$ that received treatment $a$ (or N/A if no arm of study $i$ received treatment
$a$). An IPTW estimator for multiple treatments (Imbens, 2000) can be represented
as

$$\hat{M}^a_{IPTW} = \frac{1}{N} \sum_{i=1}^{N} \frac{I(a \in A_i)}{g_a(W_i)}\, Y_i^a.$$

Intuitively, this estimator takes a mean of Yij with only the arms treated according 
to Aij = a. It then adjusts this estimate to remove the confounding bias caused by 
the baseline variables. 
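
A small worked example may help fix ideas. The sketch below assumes a single discrete covariate and estimates the generalized propensity score by the within-level proportion of studies that evaluated treatment $a$; the function names and the four toy studies are hypothetical.

```python
import numpy as np


def stratified_propensity(w, has_a):
    """Estimate g_a(W_i) = P(a in A_i | W_i) by the proportion within each level of W."""
    g = np.empty(len(w))
    for level in np.unique(w):
        g[w == level] = has_a[w == level].mean()
    return g


def iptw(y_a, has_a, g_a):
    """IPTW estimate: average over studies of I(a in A_i) * Y_i^a / g_a(W_i)."""
    y_a = np.where(has_a, y_a, 0.0)   # studies without treatment a contribute zero
    return float(np.mean(has_a * y_a / g_a))


# Hypothetical data: four studies, three of which evaluated treatment a.
w = np.array([0, 0, 1, 1])
has_a = np.array([True, False, True, True])
y_a = np.array([2.0, np.nan, 4.0, 4.0])   # N/A where no arm received treatment a

g_a = stratified_propensity(w, has_a)     # [0.5, 0.5, 1.0, 1.0]
estimate = iptw(y_a, has_a, g_a)
print(estimate)                           # 3.0
```

The unweighted mean of the three observed arms is about 3.33, whereas the weighted estimate of 3.0 gives the two covariate levels equal weight, matching their equal shares among the four studies.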

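The consistency of this estimator can be shown as follows.

$$\hat{M}^a_{IPTW} \xrightarrow{p} E\left[ \frac{Y_i^a\, I(a \in A_i)}{P(a \in A_i \mid W_i)} \right] = E\left\{ E(Y_i^a \mid W_i)\, E\left[ \frac{I(a \in A_i)}{P(a \in A_i \mid W_i)} \,\middle|\, W_i \right] \right\} = E(Y_i^a) = M^a$$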


5.3 Targeted Minimum Loss-based Estimation 


Targeted Minimum Loss-based Estimation (TMLE) (van der Laan and Rubin,
2006; van der Laan and Rose, 2011) is a framework for the construction of
semiparametric estimators generally applied to the estimation of causal quantities. The
TMLE procedure is carried out by first fitting a model for the expected value of
the arm-based means, $E(\bar{Y}_{ij} \mid W_i, A_{ij} = a)$, which, under the causal assumptions,
can equivalently be written as the expectation of the potential outcome had the
study evaluated treatment a, $E(\bar{Y}_i^a \mid W_i, a \in A_i)$. As in the G-Computation
procedure, this model can be estimated by weighting each observation by $N_{ij}/S_{ij}$. For
each arm in the study, we use this model to obtain $\bar{Y}_i^{a,0}$, predictions of the sample
mean of each trial i under treatment a. These predictions are then updated by
fitting a no-intercept logistic regression using study arms that evaluated treatment
a. This logistic regression is fit with outcome $\bar{Y}_{ij}$, offset $\mathrm{logit}(\bar{Y}_i^{a,0})$, and a single
covariate corresponding to the inverse probability weights. Denote
the estimate of the coefficient from this regression as $\epsilon$. The updated predictions
are then $\mathrm{logit}(\bar{Y}_i^{a,*}) = \mathrm{logit}(\bar{Y}_i^{a,0}) + \epsilon / g_a(W_i)$, which is calculated for each study.
The final targeted estimate for $M^a$ is $\hat{M}^a_{TMLE} = \frac{1}{N} \sum_{i=1}^{N} \bar{Y}_i^{a,*}$. Note that in order
to perform the update step, the means and outcome must be transformed to (0,1)
and then subsequently transformed back to the original scale (Gruber and van der
Laan, 2010). This can be done using real or empirical bounds.
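
The update step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: outcomes and initial predictions are assumed to be already rescaled to (0,1), the propensity values are assumed to be given, and the one-parameter logistic regression with offset is solved by Newton-Raphson instead of a regression library. The names `tmle_update`, `q_init`, and the toy values are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def tmle_update(y_obs, has_arm, q_init, g_a, n_iter=50, tol=1e-10):
    """Logistic fluctuation of initial predictions q_init (all on the (0,1) scale)."""
    y_obs = np.asarray(y_obs, dtype=float)
    has_arm = np.asarray(has_arm, dtype=bool)
    q_init = np.asarray(q_init, dtype=float)
    g_a = np.asarray(g_a, dtype=float)
    h = 1.0 / g_a               # single covariate: inverse probability weight
    off = logit(q_init)         # offset, available for every study
    eps = 0.0
    # No-intercept logistic regression with offset, fit only on studies
    # that evaluated treatment a (Newton-Raphson on the score equation).
    for _ in range(n_iter):
        p = expit(off[has_arm] + eps * h[has_arm])
        score = np.sum(h[has_arm] * (y_obs[has_arm] - p))
        info = np.sum(h[has_arm] ** 2 * p * (1.0 - p))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    q_star = expit(off + eps * h)        # updated predictions for every study
    return eps, q_star, q_star.mean()    # targeted estimate averages over all N

# Toy data (hypothetical values on the (0,1) scale)
y = np.array([0.6, np.nan, 0.7, 0.4, np.nan])
has = np.array([True, False, True, True, False])
q0 = np.array([0.5, 0.5, 0.6, 0.5, 0.4])
g = np.array([0.5, 0.4, 0.6, 0.5, 0.3])
eps, q_star, m_tmle = tmle_update(y, has, q0, g)
```

The returned mean is still on the (0,1) scale and would need to be transformed back using the same bounds that were used for the rescaling.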


This TMLE is consistent under correct specification of the propensity score
model or the model for the expected value of the mean outcome (the property of
double robustness). If both of these models are correct, then TMLE is asymptotically
efficient in the class of regular, asymptotically linear estimators in the
semiparametric model space (van der Laan and Rose, 2011). More details and a
proof of consistency are included in Appendix A.3.


6 Simulation study 

In this section we demonstrate that we can obtain consistent estimation of the
target parameter $M^a = E(\bar{Y}^a)$ under the NPSEM using the proposed estimators.
We also compare the efficiency of each approach.

While the proposed estimators do not restrict the number of study arms, we 
fix all simulated studies to have exactly two treatment arms for simplicity. We are 
interested in estimating the mean outcome of the metapopulation under treatment 
for each of four treatments of interest. For each study i = 1,...,N, we generate
the population average characteristic, $W_i$, from a Poisson distribution with mean 2.
The probabilities of receiving a given treatment are calculated conditional on the
value of $W_i$. Two treatment options $A_i$ are then sampled without replacement using
the calculated probabilities. Treatments 2 and 4 are generated to be less likely to
be chosen with larger $W_i$. The sample size $N_i$ (which we allowed to be common
to both arms in the study) is drawn from a Poisson distribution with mean linear
in $W_i$ and $A_i$. For each subject within each arm, we draw a baseline covariate
$X_{ijk}$ from a Gaussian distribution with mean $W_i$ and constant variance. We set
$\beta = (0.8, 0.2, 1, -0.05)$ to be the treatment-specific coefficients. Outcome values
$Y_{ijk}$ are drawn from a Gaussian: $Y_{ijk} \sim N(X_{ijk} + \beta[A_{ij}], 1)$. A summary of the
data-generation is presented in Table 1.

Table 1: Simulation study: data generation

Variable                    Data generation
--------------------------  ----------------------------------------------------
Study design: for each i = 1,...,N
  Number of arms            n_i = 2
  Study-level covariate     W_i ~ Poisson(mu = 2)
  Treatments                A_i sampled without replacement with probabilities
                              P_1 = logit^{-1}(0.4 W_i)
                              P_2 = logit^{-1}(-0.4 W_i)
                              P_3 = logit^{-1}(0.S W_i)
                              P_4 = logit^{-1}(-0.S W_i)
  Sample size               N_i ~ Poisson(mu = 5000 exp(-0.4 W_i - sum(gamma[A_ij])))
  (study recruitment)         where gamma = (-1.5, 1, -1, 1)

Within-study: for each j = 1,2, k = 1,...,N_i
  Subject-level covariate   X_ijk ~ N(mu = W_i, sigma^2 = 4)
  Subject-level outcome     Y_ijk ~ N(mu = X_ijk + beta[A_ij], sigma^2 = 1)
                              where beta = (0.8, 0.2, 1, -0.05)

Observed data: for each i = 1,...,N, j = 1,2
  Study-level information   W_i, A_i, and Ybar_ij
The sample statistics from each study arm are calculated by taking the mean
and standard deviation of $Y_{ijk}$ within each arm. The true treatment-specific
superpopulation means are $M^1 = 2.80$, $M^2 = 2.20$, $M^3 = 3.00$, $M^4 = 1.95$. We are
interested in estimating a subset of the contrasts between the treatments, specifically
marginal mean differences $M^2 - M^1 = -0.60$, $M^3 - M^1 = 0.20$, and $M^4 - M^1 =
-0.85$. Note that random effects were not generated in this simple simulation
study.
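
The true means and contrasts follow directly from the data-generating process: since $X_{ijk}$ has mean $W_i$ and $W_i$ is Poisson with mean 2, each treatment-specific mean is $2 + \beta_a$. A short Python check of this arithmetic, assuming $\beta = (0.8, 0.2, 1, -0.05)$ as given in the text (variable names are illustrative):

```python
import numpy as np

mean_w = 2.0                                # E(W_i) for W_i ~ Poisson(2)
beta = np.array([0.8, 0.2, 1.0, -0.05])     # treatment-specific coefficients
true_means = mean_w + beta                  # M^a = E(X_ijk) + beta_a, E(X_ijk) = E(W_i)
contrasts = true_means[1:] - true_means[0]  # differences relative to treatment 1
print(np.round(true_means, 2), np.round(contrasts, 2))
```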

We tested the three methods described in the text (G-Computation, IPTW and
TMLE) for N = 15 and 50 simulated studies. We used logistic regression models
conditional on the covariate for the generalized propensity score for IPTW
and TMLE. We ran two scenarios: incorrect and correct outcome model
specification. For the correct scenario, linear regression models for the outcome
adjusting for treatment type and covariate were used in G-Computation and TMLE.
For the incorrect scenario, the outcome was scaled to (0,1) and logistic regression
models were used. We also display results for an unadjusted estimator that
merely takes the mean difference in treatment-specific outcomes when available.
Variance and confidence intervals were estimated using the nonparametric cluster
bootstrap (Efron and Tibshirani, 1994) where study is considered the cluster (and
arms are the individual observations). In Table 2 we present statistics describing
the quality of the estimation of all contrasts with treatment 1. These statistics are
the percent finite sample bias (“% Bias”), the standard deviation of the estimates
over the simulated data (“SE-MC”), the bootstrap-estimated standard error
(“SE-BS”), and the percentage of the 95% confidence intervals that capture the true
effect size (“% Cov”). Bootstrap resamples that did not allow for an estimate of
the contrast (i.e. if either of the treatments did not appear in the resampled data
set) were discarded, potentially biasing this standard error estimate.
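
The cluster bootstrap described above can be sketched as follows. This is an illustrative Python version, not the authors' code: whole studies are resampled with replacement, all arms of a drawn study are kept together, and resamples in which either treatment is absent are discarded. For brevity it uses the unadjusted contrast as the estimator; the function names and toy data are hypothetical.

```python
import numpy as np

def unadjusted_contrast(treatment, y_arm, a, ref):
    """Difference in mean arm outcomes (a minus ref); None if either is absent."""
    in_a, in_ref = treatment == a, treatment == ref
    if not in_a.any() or not in_ref.any():
        return None
    return y_arm[in_a].mean() - y_arm[in_ref].mean()

def cluster_bootstrap_se(study_ids, treatment, y_arm, a, ref, n_boot=500, seed=0):
    """Bootstrap standard error, resampling studies (clusters) with replacement."""
    rng = np.random.default_rng(seed)
    study_ids = np.asarray(study_ids)
    treatment = np.asarray(treatment)
    y_arm = np.asarray(y_arm, dtype=float)
    studies = np.unique(study_ids)
    rows = {s: np.flatnonzero(study_ids == s) for s in studies}
    estimates = []
    for _ in range(n_boot):
        drawn = rng.choice(studies, size=len(studies), replace=True)
        idx = np.concatenate([rows[s] for s in drawn])  # keep arms of a study together
        est = unadjusted_contrast(treatment[idx], y_arm[idx], a, ref)
        if est is not None:                             # discard unusable resamples
            estimates.append(est)
    return np.std(estimates, ddof=1), len(estimates)

# Toy data: six two-arm studies (hypothetical values)
study_ids = np.repeat(np.arange(6), 2)
treatment = np.array([1, 2, 1, 3, 1, 2, 2, 3, 1, 2, 1, 3])
y_arm = np.array([2.9, 2.1, 2.7, 3.1, 2.8, 2.3, 2.2, 3.0, 2.6, 2.0, 3.0, 2.9])
se, n_used = cluster_bootstrap_se(study_ids, treatment, y_arm, a=2, ref=1)
```

Returning the number of usable resamples alongside the standard error makes it easy to see how often resamples were discarded, which is the source of the potential bias noted in the text.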

The unadjusted estimator was greatly biased for the first and third contrasts,
indicating that those two contrasts were highly confounded by the simulated
study-level covariate. The correctly specified G-Computation estimator had the lowest
bias throughout, the smallest standard errors, and near optimal confidence interval
coverage. This is to be expected as G-Computation is a function of maximum
likelihood parameter estimates with correct parametric specification of the necessary
component of the likelihood (namely, the conditional mean of the outcome).
However, with an incorrectly specified outcome model, the estimator was biased,
which caused the coverage to suffer for the third contrast.

IPTW was the most biased estimator and also had the largest variance. The
bias largely dissipated when the sample size was increased to N = 50 studies.
IPTW had good coverage except for the third treatment contrast where treatment
4 was rare. The slower convergence of IPTW in the contrast involving treatment 4
can be explained by a higher variance of the estimated weights for that treatment
compared to the others. The performance of IPTW has previously been seen to
suffer when data support for certain exposure levels is sparse (i.e. under near
practical positivity violations) (Gruber and van der Laan, 2010). Truncation of
the propensity score at 5% and 10% respectively (that is, replacing the bottom p%
of the propensity score with the pth percentile) (Cole and Hernán, 2008) increases
the bias for the first and third contrasts while reducing the variance, with no effect
on the coverage (results not shown).
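
The truncation rule in parentheses above can be written compactly. The sketch below is illustrative only; the function name `truncate_propensity` and the toy values are hypothetical, and the toy call uses p = 25 rather than 5 or 10 so that the effect is visible on five values.

```python
import numpy as np

def truncate_propensity(g, p):
    """Replace the bottom p% of propensity values with the p-th percentile."""
    g = np.asarray(g, dtype=float)
    cutoff = np.percentile(g, p)
    return np.maximum(g, cutoff)

g_hat = np.array([0.02, 0.1, 0.3, 0.5, 0.9])   # toy estimated propensities
g_trunc = truncate_propensity(g_hat, 25)
```

Raising the smallest propensities caps the largest inverse probability weights, which is why truncation trades additional bias for reduced variance.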

TMLE with correct outcome model specification had bias comparable to
G-Computation but slightly higher for N = 15. For N = 50, the standard error
of TMLE was comparable to that of G-Computation but for N = 15 it was up
to 20 times larger. Regardless, correctly specified TMLE had good coverage
throughout. Notably, the bootstrap standard error estimates were comparable to
the Monte-Carlo standard error for N = 50 but diverged for IPTW and TMLE
when N = 15. Certain implementations of TMLE are more sensitive to near
practical positivity violations (Gruber and van der Laan, 2010; Porter et al., 2011;
Schnitzer et al., 2013), hence the need for the robust version that involves the
logistic regression for the update of the predictions (as described in Gruber and
van der Laan, 2010, and for our specific setting in Section 5.3). When the outcome
model was misspecified, TMLE also accrued bias for the first and third contrasts,
with magnitude comparable to the misspecified G-Computation. This bias
decreased with more studies due to the double robustness of TMLE (making this
estimator consistent even when the outcome model is misspecified). Coverage
only suffered for the third contrast, which was the most biased.


7 Application: Antibiotic use on methicillin-resistant 
Staphylococcus aureus infection 

We illustrate this causal inference approach and the adapted estimation methods
in network meta-analysis with an example from infectious disease research. An
increase in MRSA has spurred investigation of comparative efficacy of different
antibiotic treatment options. While the antibiotic vancomycin has been the
standard treatment for decades, treatment failures have been noted in patients with
serious infections (Liu et al., 2011). Interest therefore lies in whether alternative
antibiotics are as effective as the standard. Bally et al. (2012) performed a
systematic review and Bayesian network meta-analyses of RCTs of parenteral antibiotics
used for treating hospitalized adults with complicated skin and soft-tissue
infections (cSSTIs) and hospital-acquired or ventilator-associated pneumonia.

Table 2: Simulation: Quality of treatment contrast estimation with a Gaussian
outcome (two-arm studies, 1000 simulated datasets).

                                     % Bias        SE-MC       SE-BS (% Cov)
Contrast            Estimator       N=15  N=50   N=15  N=50   N=15        N=50

Correctly specified models
M^2 - M^1 = -0.6    G-Comp             0     0   0.04  0.02   0.04 (91)   0.02 (94)
                    IPTW              40     9   0.57  0.46   0.61 (89)   0.41 (92)
                    TMLE               4     0   0.27  0.03   0.44 (96)   0.05 (92)

M^3 - M^1 = 0.20    G-Comp             1     0   0.04  0.02   0.04 (91)   0.02 (95)
                    IPTW              -2     0   0.10  0.04   0.25 (99)   0.05 (97)
                    TMLE              -1    -1   0.17  0.04   0.29 (96)   0.04 (93)

M^4 - M^1 = -0.85   G-Comp             0     0   0.04  0.02   0.04 (89)   0.02 (93)
                    IPTW              81     3   0.76  0.74   0.62 (63)   0.80 (68)
                    TMLE              -9     0   0.81  0.11   0.74 (94)   0.21 (95)

Misspecified outcome model
M^2 - M^1 = -0.6    No adjustment    101   103   0.65  0.35   0.61 (75)   0.34 (52)
                    G-Comp             2    12   0.20  0.13   0.24 (98)   0.11 (94)
                    TMLE              -8    -2   0.33  0.09   0.46 (97)   0.11 (96)

M^3 - M^1 = 0.20    No adjustment      5    -7   0.37  0.20   0.38 (92)   0.20 (93)
                    G-Comp            -1     7   0.20  0.12   0.18 (99)   0.11 (95)
                    TMLE               0     0   0.15  0.05   0.28 (99)   0.05 (96)

M^4 - M^1 = -0.85   No adjustment    126   125   0.69  0.38   0.61 (51)   0.36 (18)
                    G-Comp            36    33   0.53  0.29   0.48 (88)   0.24 (80)
                    TMLE              44   -24   0.86  0.38   0.75 (87)   0.36 (75)

We consider the target population of interest to be the population of clinical
trial participants with suspected or confirmed MRSA cSSTIs or pneumonia,
with corresponding studies published until May 2012. The site of infection and
confirmation of MRSA represent important differences in the entrance criteria of
the various studies. 24 studies were found. Patients were randomized based on
suspicion of MRSA in all but three studies for which the protocol specified
confirmation of presence of MRSA at baseline. 14 studies enrolled subjects with
cSSTIs, 7 studies enrolled subjects with hospital-acquired or ventilator-associated
pneumonia, and 3 studies allowed for either indication. The original network
meta-analysis of Bally et al. (2012) analyzed each infection site in separate
analyses and therefore obtained stratified estimates. Based on the theory we developed,
we can account for the potentially different treatment effects in each
subpopulation by controlling for subpopulation type as a covariate in the analysis. By doing
so, we ask a higher-level yet still clinically interesting question: “Are the
alternative therapies as effective as the standard antibiotic for the treatment of suspected
or confirmed MRSA?” Because infection site, MRSA confirmation, and study
year can potentially affect the choice of investigated therapies and the outcomes,
these three covariates (labeled $W_i$) should be adjusted for in order to minimize
confounding bias.

The outcome of interest is clinical test of cure for all subjects who received
at least one dose of treatment (a standard measure in infectious disease research).
Four papers evaluated the outcome only on a subset of patients selected
post-randomization; as this does not conform to our definition of the RCT-specific
parameter of interest, we considered these outcomes missing. For our analysis, we
chose to compare vancomycin with the two most prevalent alternatives: telavancin
and linezolid. In total, 47 study arms evaluated one of these three treatments and
36 had an observed outcome. Of the remaining treatments, tigecycline,
daptomycin, and ceftaroline were each evaluated in three study arms, and a regimen
of quinupristin/dalfopristin was evaluated in one arm. All of this information is
available in the data extraction Table 3.

We ran four methods to obtain estimates of the counterfactual relative risk of
both contrasts with the comparator vancomycin. The methods are 1) a ratio of
the unadjusted mean outcomes using all available arms (called “No Adjust”), 2) a
random effects regression for the arm-specific study outcomes using a log-link and
a study-specific intercept (“RE Arm”), 3) G-Computation where a random effects
logistic regression weighted by the inverse standard errors is used to predict the
conditional mean outcomes, and 4) TMLE with a weighted logistic random effects
model for the outcome and LASSO-penalized logistic regressions (to handle the
sparse data) for the propensity score and a missing data model using the R library
glmnet (Friedman et al., 2010). IPTW behaved erratically and was not included
in this example. The missing outcomes required that the TMLE algorithm include
fitting a model to estimate the probability of a missing outcome in each study;
the TMLE update step was therefore modified to use a product of the propensity
score and the probability of observing the outcome in place of $g_a(W_i)$. To estimate
the standard errors and confidence intervals, the built-in functions in the library
lme4 were used for the random effects model, and the clustered nonparametric
bootstrap (1000 times 54 resamples of 27 studies with replacement) was used for
the other methods.

The results of the network meta-analysis are presented graphically in Figure 2
(and numerically in the Appendix). We also included the results of the
studies that contrasted the two treatments directly. For the comparison of
telavancin versus vancomycin, all estimators include the null in the confidence
interval. The random effect regression and G-Computation produce estimates of
the relative risk close to one, indicating near equivalence of treatments while the
point estimate of TMLE was further from the null (in the direction of the
superiority of vancomycin). Notably, the confidence interval for the TMLE in the first
contrast is much wider than the others. The unadjusted method produced a point
estimate in the direction of the superiority of telavancin, demonstrating that the
correction for study-level confounding impacted the analysis. For the comparison
of linezolid versus vancomycin, the random effects regression, G-Computation
and TMLE agree on the superiority of linezolid. The original study by Bally et al.
(2012) also found some suggestion of a superior effect of linezolid compared to
vancomycin but for both subpopulations the confidence intervals were large and
spanned the null.


[Figure 2 (forest plots; graphic not reproduced). Panel (a): direct and NMA
contrasts between telavancin and vancomycin, listing Stryjewski et al, 2008 (twice),
Stryjewski 2006, and Rubenstein 2011 (twice), followed by the No Adjust, RE Arm,
G-Comp, and TMLE estimates. Panel (b): direct and NMA contrasts between
linezolid and vancomycin, listing Weigelt et al, 2005, Stevens et al, 2002 (twice),
Wunderink et al, 2003, Rubenstein et al, 2001, Itani et al, 2010, and Wunderink
et al, 2012, followed by the same four estimates. The horizontal axis of each panel
is the risk ratio.]

Figure 2: Risk ratio estimates and confidence intervals for clinical success at test
of cure for all studies with direct comparisons and all network meta-analysis
methods for the contrasts between a) telavancin and vancomycin and b) linezolid and
vancomycin. Risk ratio values below one indicate superiority of vancomycin.
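
To make the scale of the figure concrete, the direct risk ratio for a single trial can be computed from the counts in the data extraction table. The sketch below uses the Stryjewski et al., 2006 arms (telavancin 82/100, vancomycin 81/95). The Wald interval on the log scale is a standard textbook choice and is an assumption here; the text does not state how the intervals for the direct comparisons were obtained. The function name `risk_ratio` is hypothetical.

```python
import math

def risk_ratio(events_trt, n_trt, events_ref, n_ref, z=1.96):
    """Risk ratio (treatment vs reference) with a log-scale Wald interval."""
    rr = (events_trt / n_trt) / (events_ref / n_ref)
    se_log = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ref - 1 / n_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Stryjewski et al., 2006: telavancin 82/100 cured, vancomycin 81/95 cured
rr, lower, upper = risk_ratio(82, 100, 81, 95)
```

Here the ratio is roughly 0.96, slightly below one, so this single trial on its own leans marginally toward vancomycin, with an interval that includes one.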

We can also easily obtain estimates of the contrast between telavancin and 
linezolid. The G-Computation and TMLE produce risk ratios for clinical success 
of 0.94 (95% confidence interval = 0.92,0.94) and 0.85 (0.71,1.12) respectively, 
with G-Computation concluding the superiority of linezolid. As no RCT directly
contrasted these two antibiotics, this demonstrates another general advantage of
network meta-analysis, which is the ability to formally compare treatments using 
only indirect evidence of their relative performance. 

If we are to interpret the summary statistics as estimates of the relative causal 
effects of antibiotic choice on successful treatment, the causal assumptions stated
earlier need to be satisfied. Each of the studies evaluated the clinical efficacy of
the treatments, which is defined on patients who had received at least one dose 
of the study drug. Because randomized treatment was first-line therapy (admin¬ 
istered intravenously in-hospital) and the success of treatment was determined 
clinically, each trial estimated the relative effect under full adherence. No inter¬ 
ference: No interference is credible in this case because all subjects were already 
suspected or confirmed to have MRSA upon entry to the study. Therefore, the 
choice of treatment in the other arm wouldn’t have an effect on existing infec¬ 
tions nor the success of treatment. Unconfoundedness: The unconfoundedness 
assumption relies on whether year, infection type, and whether MRSA was con¬ 
firmed were sufficient to control for confounding at the study-level. This assump¬ 
tion could be violated if prognostic demographic variables were involved in the 
study design stage. However, prognostic markers such as diabetes and peripheral 
vascular disease (for cSSTI) and mechanical ventilation, APACHE II score, clin¬ 
ical markers of severity, and presence of organ dysfunction (for pneumonia) are 
unlikely to determine the choice of initial therapy (Lipsky et al., 2011; Niederman,
2010). Consistency: The dosage regimens varied somewhat across studies


but were all considered to be at therapeutic levels. However, the length of time 
to the evaluation time point for each treatment type varied within and between 
studies (e.g. 7-14 days for telavancin versus 12-28 days for linezolid). If this 
corresponds to meaningfully different treatment durations (and/or periods of time 
lapsed before evaluation), this would indicate different definitions of interventions 
across studies, and thus a violation of the consistency assumption. Positivity: All 
subjects in the study were indicated to receive any of the treatments evaluated. 


8 Summary 

In this paper, we nonparametrically define the parameter of interest in a net¬ 
work meta-analysis with direct and indirect comparisons using the counterfactual 
framework often employed in causal inference. This definition of the parame¬ 
ter of interest is model-independent and is interpretable on what we define as a 
metapopulation, the union of all superpopulations. Such an approach allows for a
straightforward description of what is being estimated, which is accessible even
without an understanding of the estimation methods being used. In particular, 
we can interpret the marginal effects defined in this paper as the relative mean 
outcome had all subjects in the metapopulation been assigned to each treatment 
versus another. If a specific population is of interest and not represented by the 
metapopulation, with some conditions it may also be possible to more generally 
transport effect estimation, as described by Bareinboim and Pearl (2013). 

We have presented a set of conditions under which identifiability of the pa¬ 
rameter of interest is possible. Identifiability allows for a clear description of 
when the parameter of interest can and cannot be estimated. For instance, the 
non-interference requirement casts doubt on the synthesis of studies that allow for 
treatment switching, crossover, or group contamination. The assumptions that we 
made allowed for the simplification of the relevant components of the observed 
data likelihood so that arm-based inference is possible. 

One might alternatively specify the RCT-estimated contrast as the “outcome 
of interest” (rather than use the arm-specific outcome as we did). However, un¬ 
der this alternative, the propensity score would then be defined as the probability 
of a trial directly contrasting a given treatment pair. For standard network meta¬ 
analysis sample sizes, this would most often produce practical positivity problems, 
indicating the need for extrapolation using the outcome model (and thereby cre¬ 
ating estimators that are very sensitive to model misspecification). In particular, 
two treatments that had never been directly compared would have no data support 
in this model. 

If all treatments are selected completely at random into studies (or if only 
two treatments have ever been available to compare) then a standard unadjusted 
analysis using those arms assigned the desired treatments would be consistent. If 
we weaken this assumption and replace it with conditional exchangeability, then 
the estimators introduced in this paper are appropriate in that they allow for the 
adjustment of study-level covariates. 

Our methods also allow for wider inclusion criteria of studies in a systematic
review. It is often the case that systematic reviews will exclude studies because 
they do not evaluate the exact desired clinical endpoint. Using our proposed meth¬ 
ods, we can avoid selection bias due to studies excluded only for this reason. To 
do so, we would artificially censor the outcomes of studies that do not estimate the 
desired outcome-type of interest. The censored outcomes of these studies might 
then be considered “missing at random” conditional on the study baseline infor¬ 
mation which should still be included in the analysis (both in the propensity score 
model and the missing data model).

For the analysis of continuous individual-level outcomes, we assumed
independence between the sample mean and standard deviation within each study
arm. While we chose to present our identifiability argument under this assump¬ 
tion, it is not ultimately necessary. However, it is not straight-forward to propose 
a valid Monte Carlo or Bayesian estimation approach to the setting with depen¬ 
dent sample means and standard deviations. In some cases, it may be possible to 
transform the individual-level data to remove the skew, but this relies on access to 
each study’s raw data, in which case an individual patient data analysis would be 
preferable. 

In the simulation study, we show that certain estimators adopted from the 
causal inference literature can produce valid estimates of effect contrasts under 
the identifiability conditions described. In particular, G-Computation and TMLE 
might lend themselves well to network meta-analysis, which is characterized by 
small sample sizes and low prevalence for certain treatments. IPTW was seen 
to be sensitive to rare treatment assignment and G-Computation and TMLE were 
seen to be somewhat sensitive to model misspecification. Some general benefits 
of using TMLE are that it is double robust and can incorporate nonparametric (or 
machine learning) estimation of the propensity score and outcome model which 
can help avoid bias from model misspecification (van der Laan and Rose, 2011).
More methods development and investigations are needed to address extremely 
rare treatments and how (or whether) TMLE can be adapted to be robust in this 
setting. 

The application we presented compared the results of random effects
regression, G-Computation, and TMLE in a network meta-analysis of the relative
efficacy of treatment options for MRSA infection. The random effects regression and
G-Computation produced small confidence intervals relative to the direct contrasts
of the individual RCTs though TMLE only did for one comparison investigated. In
contrast to the analysis in the original article that used unadjusted contrast-based
hierarchical Bayesian modeling on the separate subpopulations of infection types,
our analyses concluded that there is evidence to support the superiority of
linezolid over vancomycin. We also noted the poor stability of IPTW in this example
and generally do not recommend this estimator when the data support for
certain treatment levels is sparse. Finally, using this data example, we demonstrated
how the causal assumptions should be listed and critiqued in order to stimulate
discussion about the appropriateness of causal interpretations in specific contexts.

The framework we present formally assumes that we are restricting our
analyses to studies evaluating a common parameter-type. If there was only partial
adherence in the RCTs, our framework does not allow for the mixing of
intent-to-treat parameter estimates with adherence-adjusted parameter estimates.
(Estimation of the adherence-adjusted parameters in RCTs is described in Hernán and
Hernández-Díaz, 2012.) The same restriction applies to the results of observational
studies if the parameter type estimated in the observational study is not the same as
in the clinical trials. Specifically, treatment adherence and outcome need to be
defined identically across studies, and all studies whose endpoints are included must
estimate the same mean treatment-specific counterfactual outcome. Although it
is common practice to include different parameter types in a meta-analysis, our
formalization of the target parameter reveals that a causal interpretation of the
resulting effect estimate may be quite challenging.

In addition to the issues we describe, there are many other concerns about
aggregating study results in various settings. For instance, one might question
the independence between RCTs happening close in time, or the systematic
review inclusion criteria. We believe our framework provides additional structure to
the ongoing discussion about the validity of network meta-analysis and will help
stimulate solutions to the remaining challenges.


A Appendix

A.1 Proof of identifiability under structural independence

The joint counterfactual distribution can be decomposed as $f(O^a) = f_1(O^a) f_2(O^a)$
where $f_1(O^a) = Q_W(W_i)\, Q_Y(\bar{Y}^a_{ij} \mid N^a_{ij}, W_i)$ and

$$f_2(O^a) = Q_n(n_i \mid W_i)\, g_{A \setminus j}(A^a_{i \setminus j} \mid n_i, W_i)\, Q_N(N^a_{ij} \mid A^a_{i \setminus j}, n_i, W_i)\, Q_S(S^a_{ij} \mid N^a_{ij}, W_i) \times
\prod_{j^* \neq j} Q_{Y,S}(\bar{Y}^a_{ij^*}, S^a_{ij^*} \mid N^a_{ij^*}, A^a_{ij^*}, W_i)\, Q_N(N^a_{ij^*} \mid A^a_{i \setminus j}, n_i, W_i).$$

Let $\mathcal{A}$ be the set of possible treatments. The target of our analysis is the study
arm counterfactual outcome under treatment a, or $E(\bar{Y}^a_{ij}) = M^a$. This mean can be
written as

$$M^a = \int \cdots \int E(\bar{Y}^a_{ij} \mid N^a_{ij}, W_i)\, f_2(O^a)\, Q_W(W_i)\; dS_{ij^*}\, d\bar{Y}_{ij^*}\, dS^a_{ij}\, dW_i$$

where the integral for $W_i$ can be a multiple integral, taken over the domain of
potentially multivariate $W_i$. Now we note that for identically distributed and
conditionally independent draws

$$E(\bar{Y}^a_{ij} \mid N^a_{ij}, W_i) = E\left(\frac{1}{N^a_{ij}} \sum_{k=1}^{N^a_{ij}} Y^a_{ijk} \,\middle|\, N^a_{ij}, W_i\right) = E(Y^a_{ijk} \mid W_i)$$

because we assume that the study size has no effect on the individual-level
outcome. It follows that $E(\bar{Y}^a_{ij} \mid N^a_{ij}, W_i)$ is conditionally independent of $N^a_{ij}$. The
expression for $M^a$ then simplifies to $\int_{W_i} E(\bar{Y}^a_{ij} \mid N^a_{ij}, W_i)\, Q_W(W_i)\, dW_i$. In order for
the conditional expectation to be estimable from the observed data, we require the
unconfoundedness assumption $\bar{Y}^a_{ij} \perp A_{ij} = a \mid N^a_{ij}, W_i$. With respect to the example
DAG in Figure 1(a), this corresponds to having measured all components of $W2_i$.
If this assumption holds in addition to the consistency of treatment for $\bar{Y}^a_{ij}$, we may
write $M^a = \int_{W_i} E(\bar{Y}_{ij} \mid W_i, A_{ij} = a)\, Q_W(W_i)\, dW_i$ to establish identifiability.


A.2 Identifiability without assuming structural independence 

It may not be plausible to assume conditional independence between $\bar{Y}^a_{ij}$ and $S^a_{ij}$. In
this case, the relevant part of the distribution of the observed data counterfactuals
is

$$f_3(O^a) = Q_W(W_i)\, Q_n(n_i \mid W_i)\, g_{A \setminus j}(A^a_{i \setminus j} \mid n_i, W_i)\, Q_N(N^a_{ij} \mid A^a_{i \setminus j}, n_i, W_i)\, Q_{Y,S}(\bar{Y}^a_{ij}, S^a_{ij} \mid N^a_{ij}, W_i).$$

The target parameter can be estimated as a multiple integral over each density
component in $f_3(O^a)$. Identifiability in this case requires a list of
unconfoundedness assumptions: $A^a_{i \setminus j} \perp A_{ij} = a \mid n_i, W_i$;
$N^a_{ij} \perp A_{ij} = a \mid A^a_{i \setminus j}, n_i, W_i$;
and $\bar{Y}^a_{ij}, S^a_{ij} \perp A_{ij} = a \mid N^a_{ij}, W_i$. Assuming the DAG in Figure 1(a) in the main
manuscript, this requires having measured all components of $W1_i$, $W2_i$, and $W3_i$.

It also requires the consistency assumption for $N^a_{ij}$, $\bar{Y}^a_{ij}$ and $S^a_{ij}$. Under these
assumptions, we can rewrite the relevant density component as

$$f_3(O^a) = Q_W(W_i)\, Q_n(n_i \mid W_i)\, g_{A \setminus j}(A_{i \setminus j} \mid A_{ij} = a, n_i, W_i)\, Q_N(N_{ij} \mid A_{ij} = a, A_{i \setminus j}, n_i, W_i) \times
Q_{Y,S}(\bar{Y}_{ij}, S_{ij} \mid A_{ij} = a, N_{ij}, W_i).$$

Since each component of this density is estimable from the data, we have
identifiability of the target parameter in this case as well.


A.3 Efficiency and Consistency of TMLE 

The local semiparametric efficiency and estimation consistency of the TMLE we
describe can be derived very similarly to the standard observational data setting
(with a single categorical exposure variable) for the estimation of the average
treatment effect (van der Laan and Rose, 2011). To give more insight into how
this extends to the network meta-analysis case, we present some additional details
and a proof of double robustness.

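The efficient influence function for parameter of interest $M^a$ with only
aggregate observed data is

$$D(O) = \frac{I(a \in A_i)}{g_a(W_i)} \{\bar{Y}_{ij} - E(\bar{Y}^a_i \mid W_i, a \in A_i)\} + E(\bar{Y}^a_i \mid W_i, a \in A_i) - M^a.$$

Note that the TMLE update step produces values of $\bar{Y}^{a,*}_i$ that solve the empirical
efficient influence function equation:

$$\sum_{i=1}^{N} \sum_{j=1}^{n_i} \left[ \frac{I(a \in A_i)}{g_a(W_i)} (\bar{Y}_{ij} - \bar{Y}^{a,*}_i) + (\bar{Y}^{a,*}_i - \hat{M}^a_{TMLE}) \right] = 0$$

so that it follows that the TMLE is a locally efficient estimator (van der Laan and
Robins, 2003; Tsiatis, 2006). Specifically, the logistic regression update step with
single covariate $X_i = I(a \in A_i)/g_a(W_i)$ solves the score equation
$\sum_{i=1}^{N} X_i (\bar{Y}_{ij} - \bar{Y}^{a,*}_i) = 0$ and in the last TMLE step we set
$\hat{M}^a_{TMLE} = \frac{1}{N} \sum_{i=1}^{N} \bar{Y}^{a,*}_i$.

First suppose that for increasing values of $\sum_{i=1}^{N} I(a \in A_i)$, the estimated generalized
propensity score $\hat{g}_a(W_i)$ converges to some $g_a(W_i) \neq P(a \in A_i \mid W_i)$ but that $\bar{Y}^{a,*}_i$
converges to the true values $E(\bar{Y}_{ij} \mid W_i, a \in A_i)$. We then have that

$$E\left[ \frac{I(a \in A_i)}{g_a(W_i)} \{\bar{Y}_{ij} - E(\bar{Y}^a_i \mid W_i, a \in A_i)\} + E(\bar{Y}^a_i \mid W_i, a \in A_i) - M^a \right]$$
$$= E\left[ E\{\bar{Y}_{ij} - E(\bar{Y}^a_i \mid W_i, a \in A_i) \mid W_i, a \in A_i\} \times \frac{I(a \in A_i)}{g_a(W_i)} + E(\bar{Y}^a_i \mid W_i, a \in A_i) - M^a \right]$$
$$= E\left[ 0 \times \frac{I(a \in A_i)}{g_a(W_i)} \right] + 0 = 0.$$

Now suppose that $\hat{g}_a(W_i)$ converges to the true values $P(a \in A_i \mid W_i)$ but that
$\bar{Y}^{a,*}_i$ converges to some function $Q_a(W_i) \neq E(\bar{Y}_{ij} \mid W_i, a \in A_i)$. We then have that

$$E\left[ \{\bar{Y}_{ij} - Q_a(W_i)\} \times \frac{I(a \in A_i)}{P(a \in A_i \mid W_i)} + Q_a(W_i) - M^a \right]$$
$$= E\left[ \{\bar{Y}_{ij} - Q_a(W_i)\} \times E\left\{ \frac{I(a \in A_i)}{P(a \in A_i \mid W_i)} \,\middle|\, W_i \right\} + Q_a(W_i) - M^a \right]$$
$$= E\left[ \{\bar{Y}_{ij} - Q_a(W_i)\} \times 1 + Q_a(W_i) - M^a \right]$$
$$= E[\bar{Y}_{ij} - M^a] = 0.$$

Therefore, if either of the models for $E(\bar{Y}_{ij} \mid W_i, a \in A_i)$ or $P(a \in A_i \mid W_i)$ are
consistent, then the TMLE for $M^a$ is also consistent as the efficient influence function
equation is consistent for $M^a$.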


A.4 Data extraction information and numerical results for the 
example of antibiotic use on methicillin-resistant Staphy¬ 
lococcus aureus infection 


Table 3 presents the full study list from the systematic review of Bally et al. (2012)
and the data that we used in the analysis in Section 7. A separate table presents the
numerical results that we obtained from our analyses, corresponding with Figure 2. The
full reference list is below.


References for the MRSA application 

Arbeit, R. D., D. Maki, F. P. Tally, E. Campanaro, B. I. Eisenstein, and Daptomycin
98-01 and 99-01 Investigators (2004): “The safety and efficacy of daptomycin
for the treatment of complicated skin and skin-structure infections,”
Clinical Infectious Diseases, 38, 1673-1681.
Table 3: Data extraction table for the network meta-analysis of antibiotic use on
methicillin-resistant Staphylococcus aureus infection

Publication                Events  N_i  A_i                        Study ID  Year  Infection  Confirmed MRSA at baseline
Katz et al., 2008          42      48   vancomycin                 1         2007  cSSTI      0
                           36      48   daptomycin                 1         2007  cSSTI      0
Arbeit et al., 2004        162     266  vancomycin                 2         2001  cSSTI      0
                           165     264  daptomycin                 2         2001  cSSTI      0
                           235     292  vancomycin                 3         2000  cSSTI      0
                           217     270  daptomycin                 3         2000  cSSTI      0
Breedt et al., 2005        216     250  vancomycin                 4         2003  cSSTI      0
                           212     253  tigecycline                4         2003  cSSTI      0
Sacchidanand et al., 2005  196     255  vancomycin                 5         2003  cSSTI      0
                           203     268  tigecycline                5         2003  cSSTI      0
Stryjewski et al., 2008    307     429  vancomycin                 6         2006  cSSTI      0
                           309     426  telavancin                 6         2006  cSSTI      0
                           360     489  vancomycin                 7         2006  cSSTI      0
                           348     472  telavancin                 7         2006  cSSTI      0
Stryjewski et al., 2006    81      95   vancomycin                 8         2004  cSSTI      0
                           82      100  telavancin                 8         2004  cSSTI      0
Corey et al., 2010         297     347  vancomycin                 9         2007  cSSTI      0
                           304     351  ceftaroline                9         2007  cSSTI      0
Wilcox et al., 2010        289     338  vancomycin                 10        2007  cSSTI      0
                           291     342  ceftaroline                10        2007  cSSTI      0
Talbot et al., 2007        26      32   vancomycin                 11        2005  cSSTI      0
                           59      67   ceftaroline                11        2005  cSSTI      0
Weigelt et al., 2005       402     573  vancomycin                 12        2003  cSSTI      0
                           439     583  linezolid                  12        2003  cSSTI      0
Stevens et al., 2002       54      87   vancomycin                 13        1999  cSSTI      0
                           64      99   linezolid                  13        1999  cSSTI      0
                           16      32   vancomycin                 14        1999  pneumonia  0
                           20      39   linezolid                  14        1999  pneumonia  0
Wunderink et al., 2003     128     302  vancomycin                 15        2000  pneumonia  0
                           135     321  linezolid                  15        2000  pneumonia  0
Rubenstein et al., 2001    73      192  vancomycin                 16        1999  pneumonia  0
                           85      203  linezolid                  16        1999  pneumonia  0
Rubenstein et al., 2011    221     374  vancomycin                 17        2007  pneumonia  0
                           214     372  telavancin                 17        2007  pneumonia  0
                           228     380  vancomycin                 18        2007  pneumonia  0
                           227     377  telavancin                 18        2007  pneumonia  0
Fagon et al., 2000         67      148  vancomycin                 19        1996  pneumonia  0
                           65      150  quinupristin/dalfopristin  19        1996  pneumonia  0
Lin et al., 2008           NA      33   linezolid                  20        2005  cSSTI      0
                           NA      29   vancomycin                 20        2005  cSSTI      0
                           NA      38   linezolid                  21        2005  pneumonia  0
                           NA      40   vancomycin                 21        2005  pneumonia  0
Kohno et al., 2007         NA      51   linezolid                  22        2004  cSSTI      0
                           NA      26   vancomycin                 22        2004  cSSTI      0
                           NA      31   linezolid                  23        2004  pneumonia  0
                           NA      17   vancomycin                 23        2004  pneumonia  0
Florescu et al., 2008      NA      70   tigecycline                24        2005  cSSTI      0
                           NA      23   vancomycin                 24        2005  cSSTI      0
Itani et al., 2010         223     276  linezolid                  25        2007  cSSTI      1
                           196     266  vancomycin                 25        2007  cSSTI      1
Wunderink et al., 2008     NA      30   linezolid                  26        2005  pneumonia  1
                           NA      20   vancomycin                 26        2005  pneumonia  1
Wunderink et al., 2012     102     186  linezolid                  27        2010  pneumonia  1
                           92      205  vancomycin                 27        2010  pneumonia  1



Table 4: Risk ratio estimates, standard errors and 95% confidence intervals for relative
effects of antibiotics telavancin (TEL), linezolid (LIN), and mainstay therapy
vancomycin (VAN)

             TEL vs VAN                 LIN vs VAN                 TEL vs LIN
Method       Est   SE     95% CI        Est   SE     95% CI        Est   SE     95% CI
No Adjust    1.04  0.028  (0.99,1.10)   0.92  0.027  (0.87,0.97)   1.13  0.045  (1.05,1.22)
RE Arm       1.00  0.010  (0.98,1.02)   1.08  0.012  (1.05,1.10)   0.92  0.014  (0.89,0.95)
G-Comp (RE)  1.00  0.003  (1.00,1.00)   1.06  0.006  (1.06,1.09)   0.94  0.005  (0.92,0.94)
TMLE (RE)    0.89  0.106  (0.75,1.19)   1.05  0.012  (1.03,1.07)   0.85  0.102  (0.71,1.12)
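As a quick arithmetic check on Table 4, the sketch below compares each reported TEL vs LIN estimate with the ratio of the corresponding TEL vs VAN and LIN vs VAN estimates. It assumes that each contrast is a ratio of intervention-specific means, so that the three estimates in a row should agree up to rounding; the dictionary `table4` simply transcribes the point estimates from the table, and the variable names are hypothetical.

```python
# Point estimates transcribed from Table 4: (TEL vs VAN, LIN vs VAN, TEL vs LIN).
table4 = {
    "No Adjust":   (1.04, 0.92, 1.13),
    "RE Arm":      (1.00, 1.08, 0.92),
    "G-Comp (RE)": (1.00, 1.06, 0.94),
    "TMLE (RE)":   (0.89, 1.05, 0.85),
}

# TEL vs LIN implied by the two comparisons against vancomycin.
implied = {
    method: tel_van / lin_van
    for method, (tel_van, lin_van, _) in table4.items()
}

# Largest absolute gap between the implied and the reported TEL vs LIN estimate.
max_gap = max(abs(implied[m] - table4[m][2]) for m in table4)

for method, (_, _, reported) in table4.items():
    print(f"{method:12s} implied {implied[method]:.3f}  reported {reported:.2f}")
```

Under that assumption, every implied ratio lies within 0.01 of the reported value, which is the discrepancy one would expect from rounding the inputs to two decimal places.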